Abstract — Incompleteness of information about monitored objects is an inherent property of many information processing systems. For this reason, the modeling of databases with incomplete information (DBI) remains a topical area of theoretical computer science and a foundation for efficient DBI-centered software and hardware. This article describes an approach to such modeling whose main feature is the representation of a DBI as a finite set of so-called incomplete facts (N-facts), which are sentential forms (SF) of a context-free grammar whose set of rules forms the metadatabase (MDB). A relation of mutual informativity on the set of N-facts is defined, and the set-theoretical, mathematical and operational semantics of the DBI data manipulation language (DML) are described, together with a generalized model of associative storage and access that supports N-fact handling. Augmented Post systems, a knowledge representation model associated with such DBI, are introduced.

Index Terms — databases, incomplete information modeling, 
context-free grammar, sentential form, data manipulation 
language, associative storage and access to incomplete 
information, augmented Post systems, logical inference with 
incomplete information. 


I. INTRODUCTION 

Incompleteness of information about controlled objects is an inherent property of any information processing system, especially of those operating in rapidly changing conditions that require regularly modified (corrected) processing logic. Such systems are usually implemented within the network-centric paradigm [1], [2], and their control centers process flows of information from various sensors located in various areas and monitoring objects in various physical fields. Because of its location and technical capabilities, any sensor can precisely recognize only a small part of the parameters of these objects, so the fragments of information transmitted from sensors to control centers are fundamentally incomplete. That is why the databases that such systems maintain and use to accumulate information about the state of the monitored objects must be oriented from the outset to the entry, integration, storage, and update of incomplete information as the normal case. Such databases are called databases with incomplete information and may be used online (OLAP) as well as offline (Data Mining), the latter serving to update the processing logic of the former. Let us consider two examples to illustrate the above.

One of the most topical recent DBI applications is Deep Packet Inspection/Deep Packet Processing (DPI/DPP) [3]-[5], whose main task is early recognition of the signatures of cyberattack preparation and execution by OLAP of the traffic at computer network perimeters. The analyzed packets usually contain encrypted structural components, so both online and offline analysis of such traffic is based on the accumulation, update, and logical analysis of stored incomplete information, i.e. on DBI handling. DPI/DPP is the core technology for Intrusion Detection/Prevention, Data Leakage Prevention, and other essential cybersecurity technologies [6]-[10].

The second example of DBI use is systems that monitor real car traffic and detect vehicles that are on the police register because of road accidents, unpaid fines, theft and so on. This function is based on processing images of license plates transmitted to the processing centers from video cameras located along roads and streets. Because of bad weather, fog, distortion of symbols (accidental or criminal), or a non-optimal mutual position of the car and the video camera, some structural components of the license plates may be indefinite. That is why the incoming information flow contains incomplete information, and the databases that accumulate such information are DBI.

The list of examples could be continued, but there is reason to pay attention to one more essential circumstance. The information fragments that are processed, stored, and updated in a DBI are, in the general case, strings of symbols (bits) of unpredictable structure. This structure is only partly known before OLAP is implemented and is usually corrected during Data Mining. This circumstance makes the processing and accumulation of such fragments extremely complicated, if implementable at all, for relational/post-relational DBMS-centered software as well as for software based on the standard relational toolkits, whose theoretical kernel is indefinite (“null”, “unknown”, “imprecise”, “disjunctive”, “probabilistic”, “possibilistic”, etc.) values of the attributes in the records (elements of relations) [11]-[24].

Under these conditions an alternative approach to DBI modeling is necessary. This approach must cover the listed and many similar applications and create a theoretical basis for sophisticated analytical processing with dynamic, time-varying logic.

This article is dedicated to one such approach, developed within the so-called “Set-of-Strings” Framework (SSF). This framework is neither a model nor a metamodel of data; rather, it is a set of basic representations, mathematical relations and algorithms for the strict formal description of the key aspects of data and knowledge processing in various environments and modes (OLAP as well as Data Mining). The SSF was suggested and developed by the author in [25]-[28] and has since been implemented in software and hardware in various areas, one of the most recent being the aforementioned cybersecurity.

The kernel of the SSF is the integration of the best features of the classical string-generating formal grammars proposed by N. Chomsky for the description of natural language syntax [29] and widely used in modern computer science [30], and of the Post systems proposed by E. Post in [31]. The latter are interpreted below as a string-operating logical calculus similar to Horn clauses, which are now well known due to their procedural interpretation [32], [33], which led to the Prolog programming language and its various dialects [34]-[38].

Because the SSF is not well known to a wide circle of specialists, the second section of the article gives a short description of its main elements. The third section describes the basics of information incompleteness modeling within the SSF. The fourth section is dedicated to the key elements of the fusion of incomplete facts (N-facts). The main content of the fifth section is the algorithmics of N-fact processing, while the sixth section is dedicated to the modeling of DBI associative storage and the elimination of redundant search during the execution of access operations. The seventh section describes augmented Post systems (APS) and incomplete information modeling within this knowledge representation.


II. BASIC ELEMENTS OF THE SET-OF-STRINGS FRAMEWORK

The starting point of the SSF is the representation of the database as a finite set of strings (we shall call it a “set-of-strings DB”, or SDB):

$$W_t = \{w_1, \dots, w_{m_t}\} \subset V^*, \tag{1}$$

where $W_t$ denotes the DB at the discrete time moment $t$, and $V^*$ is the set of all strings over the initial (terminal) alphabet $V$. The structure of the DB elements $w \in W_t$, named facts, is determined by the metadatabase, which is denoted $D_t$ and is considered below.

The couple

$$B_t = \langle W_t, D_t \rangle \tag{2}$$

is named data storage (DS). The data storage is in a correct state if $W_t \in \mathcal{W}(D_t)$, where $\mathcal{W}(D_t)$ is the set of all correct databases defined by the MDB.

An access message to the DS is the triple

$$\omega_t = \langle o, c, x \rangle, \tag{3}$$

where $o$ is the operation whose execution is the purpose of the access (insert, delete, update, query), $c$ is the DS component (DB, MDB), and $x$ is the content of the access, i.e. the query body or the elements (facts) that are inserted or deleted. It is supposed that the answer (reply) to the access is obtained by the user at the moment $t+1$ next to $t$; when $c$ = DB, it is the finite set $A_{t+1}$ (when $c$ = MDB the answer has its own notation and is also a finite set).

All possible triples of the form (3) constitute the DS manipulation language (DSML).

There are four types of DSML semantics within the SSF: set-theoretical (S), mathematical (M), operational (O) and implementational (I).

S-semantics is a formal definition, in the language of set theory, of the functions of the DBMS that processes accesses to the DS; it is invariant to specific DSMLs. It is used as a framework for the development of a DSML as such, i.e. of its sentences.

M-semantics corresponds to S-semantics but is specified down to the level of MDB and DSML models, which are based on well-known and understandable mathematical constructions.

O-semantics corresponds to M-semantics but, unlike the latter, is defined algorithmically. The algorithm processing DSML messages should produce the result in a finite number of steps in full accordance with the expressions of M-semantics, or detect that this is impossible because of errors in the access message.

I-semantics is based on O-semantics and coincides with it in the result of access message interpretation. However, while the kernel of the O-semantics definition is considerations of algorithmic solvability, the definition of I-semantics pays special attention to efficient implementation, i.e. to minimizing the average processing time of access messages (or, at a more abstract level, the computational complexity of the corresponding algorithm).

Consider the set-theoretical semantics of the DSML sublanguage providing access to the DB, which, following canonical database terminology, is referred to below as the data manipulation language (DML). The basic supposition of DML S-semantics is that the content (substantial component) of the access message $\omega_t$ is a finite description of a set $I_t \subseteq V^*$, which corresponds to the user's knowledge about the segment of the problem area he or she is interested in (is aware of). The expressions of DML S-semantics connect $W_{t+1}$ (the database after execution of the operation) and $A_{t+1}$ (the answer to the access) with the set $I_t$ and the set $W_t$ (the database at the moment of access). Starting from the ordinary sense of operations over a DB, we have the following expressions (the basic SSF equations):

$$W_{t+1} = W_t \cup I_t, \tag{4}$$

$$A_{t+1} = W_{t+1} - W_t \tag{5}$$


for insertion (more precisely, inclusion),

$$W_{t+1} = W_t - I_t, \tag{6}$$

$$A_{t+1} = W_t - W_{t+1} \tag{7}$$

for deletion (exclusion), and

$$W_{t+1} = W_t, \tag{8}$$

$$A_{t+1} = W_t \cap I_t \tag{9}$$

for query (everywhere “$-$” is set-theoretical subtraction). In expression (4), $I_t$ postulates the facts that are actual in the problem area since moment $t$ (here, for the DB to remain finite, $I_t$ must be finite too). In expression (5) the answer contains the “new” facts that “extend” the DB $W_t$ (facts already present in the DB are not included in $A_{t+1}$). In (7), on the contrary, the answer contains the facts actually excluded from the DB: here $I_t$ postulates the facts that are no longer actual in the problem area since moment $t$. Expression (8) reflects that the DB remains unchanged after query processing. The set $I_t$ in (9) contains all facts whose actuality the query is meant to check. In all the equations, a fact $w$ once included in the DB stays actual until it is deleted from the database; this deletion is performed by expression (6). In both (6) and (9), $I_t$ may be infinite.
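The S-semantics of inclusion, deletion and query can be illustrated with ordinary finite sets. The following is only a minimal sketch: the function names `include`, `exclude` and `query` are illustrative, and Python sets are finite, so the case of an infinite set $I_t$ allowed for deletion and query is not covered.

```python
def include(db, facts):
    """Inclusion: returns the new DB and the answer (the really new facts)."""
    new_db = db | facts
    return new_db, new_db - db


def exclude(db, facts):
    """Deletion: returns the new DB and the answer (the facts really removed)."""
    new_db = db - facts
    return new_db, db - new_db


def query(db, facts):
    """Query: the DB is unchanged, the answer is the intersection."""
    return db, db & facts


W_t = frozenset({
    "AREA LITTLE LAKE NORMAL AT 15.03",
    "AREA HIGHLAND SMOKED AT 15.10",
})
# One of the two included facts is already present, so only one is "new".
W_t1, A_t1 = include(W_t, {"AREA GREEN FOREST SMOKED AT 15.10",
                           "AREA HIGHLAND SMOKED AT 15.10"})
W_t2, A_t2 = exclude(W_t1, {"AREA LITTLE LAKE NORMAL AT 15.03"})
```

In this sketch the answer to an inclusion never repeats facts that were already stored, which is exactly the difference between the set of included facts and the answer.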

Example 1. Consider a database accumulating data from a distributed network of ecological sensors that monitor the areas where they are located. Let

$W_t$ = {AREA GREEN FOREST NORMAL AT 15.00,
AREA LITTLE LAKE NORMAL AT 15.03, (10)
AREA HIGHLAND SMOKED AT 15.10},

which means that two monitored areas are in the normal state while one area is smoked. (Note the free structure of facts and the arbitrary use of natural language, which the basic “set of strings” assumption allows.) If the sensor monitoring the green forest reports that at 15.10 this area is also smoked, the inclusion

$W_{t+1}$ = $W_t$ ∪ {AREA GREEN FOREST SMOKED AT 15.10} (11)

is executed. When at this moment $t+1$ the user accesses the DB with a query whose purpose is to get information about all smoked areas, the infinite set $I_{t+1}$ may be as follows:

$I_{t+1}$ = {AREA A SMOKED AT 00.00, ...,
AREA A SMOKED AT 23.59, ...,
AREA AA SMOKED AT 00.00, ...,
AREA AA SMOKED AT 23.59, ..., (12)
AREA Z SMOKED AT 00.00, ...,
AREA Z SMOKED AT 23.59, ...}.

The answer to the query is

$A_{t+2} = W_{t+1} \cap I_{t+1}$ = {AREA HIGHLAND SMOKED AT 15.10,
AREA GREEN FOREST SMOKED AT 15.10}. (13)

In expression (12) the names of areas are all strings over the alphabet $V' = \{A, \dots, Z, 0, \dots, 9, \dots\}$, so

$I_{t+1}$ = {AREA} · $V'^{*}$ · {SMOKED AT} · {00, ..., 23} · {.} · {00, ..., 59}. ■ (14)

Note that definitions (4)-(9) are not unique. For example, in the definition of inclusion the elements of the set $I_t$ already present in the DB at moment $t$ may be included in the answer,

$$A_{t+1} = I_t, \tag{15}$$

or the answer may be defined as

$A_{t+1}$ = {FACT "} · ($W_t \cap I_t$) · {" ALREADY PRESENTS IN DATABASE}
∪ {FACT "} · ($W_{t+1} - W_t$) · {" IS INCLUDED TO DATABASE}. (16)

So, according to (16), the answer to the access may be as follows:

$A_{t+1}$ = {FACT "AREA GREEN FOREST SMOKED AT 15.10" ALREADY PRESENTS IN DATABASE,
FACT "AREA HIGHLAND SMOKED AT 15.30" IS INCLUDED TO DATABASE}. (17)

Note also that equations (4)-(9) correspond mostly to the closed world interpretation, which postulates that the absence of a fact from the DB is equivalent to its absence in reality.

During accesses to the DB the metadatabase remains constant, i.e.

$$D_{t+1} = D_t. \tag{18}$$

Let us consider the mathematical and operational semantics of the DML.

DML M-semantics is based on the representation of the MDB $D_t$ as a set of generating rules of a context-free (CF) grammar in the sense of N. Chomsky [29]. Every such rule is a construction $\alpha \to \beta$, where $\alpha$ is a non-terminal symbol (“non-terminal” for short) and $\beta$ is a string containing symbols of the terminal alphabet $V$ and non-terminals, which are the names of the structural components of the facts $w \in W_t$. The initial non-terminal (“axiom” in N. Chomsky's terminology) of the context-free grammar $G_t$, whose set of rules (“scheme”) is $D_t$, is denoted $\alpha_0$ and, from the substantial point of view, is “fact”. Within this framework

$$G_t = \langle V, N_t, \alpha_0, D_t \rangle, \tag{19}$$

where

$$N_t = \{\alpha \mid \alpha \to \beta \in D_t\}$$

is the set of non-terminals (“non-terminal alphabet”) of the CF grammar $G_t$.

The database $W_t$ is named correct with respect to the metadatabase $D_t$ if

$$W_t \subseteq L(G_t), \tag{20}$$

i.e. the facts present in the DB are words of the CF language $L(G_t)$. In other notation,

$$(\forall w \in W_t)\ \alpha_0 \overset{*}{\underset{G_t}{\Rightarrow}} w, \tag{21}$$

i.e. any fact $w$ in the DB $W_t$ is generated by (derived in) the CF grammar $G_t$. The notation $\overset{*}{\underset{G_t}{\Rightarrow}}$ is used to show the generativity (derivability) of one string over the alphabet $V \cup N_t$ from another.

Example 2. Consider a metadatabase $D_t$ containing the following rules (non-terminals are in angle brackets, as in Backus-Naur notation):

<fact> → AREA <name of area> <state> AT <time>,
<name of area> → <text>,
<state> → NORMAL,
<state> → SMOKED,
<time> → <hours>.<minutes>,
<hours> → <0 to 1><0 to 9>,
<hours> → 2<0 to 3>,
<0 to 1> → 0,
<0 to 1> → 1,
<0 to 9> → 0,
...
<0 to 9> → 9,
<0 to 3> → 0,
...
<0 to 3> → 3,
<minutes> → <0 to 5><0 to 9>,
<0 to 5> → 0,
...
<0 to 5> → 5,
<text> → <symbol>,
<text> → <symbol><text>,
<symbol> → A,
...
<symbol> → Z,
<symbol> → 0,
...
<symbol> → 9,
<symbol> → ␣.

The database

$W_t$ = {AREA AW SMOKED AT 15.10,
AREA Z NORMAL AT 23.59}

is correct with respect to this MDB, while the database

$W_t$ = {AREA AW NORMAL}

is incorrect. ■
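The correctness check of example 2 can be sketched as a small backtracking recognizer. This is an illustration under stated assumptions rather than the implementation of [25]-[28]: the dictionary `GRAMMAR`, the functions `derives` and `is_correct`, the character-level encoding of terminals (with the separating blanks written into the `<fact>` rule) and the blank as the last `<symbol>` are all choices made here for the sketch.

```python
import string

# Rules of example 2; each non-terminal maps to its alternative right parts.
GRAMMAR = {
    "<fact>": [["AREA ", "<name of area>", " ", "<state>", " AT ", "<time>"]],
    "<name of area>": [["<text>"]],
    "<state>": [["NORMAL"], ["SMOKED"]],
    "<time>": [["<hours>", ".", "<minutes>"]],
    "<hours>": [["<0 to 1>", "<0 to 9>"], ["2", "<0 to 3>"]],
    "<minutes>": [["<0 to 5>", "<0 to 9>"]],
    "<0 to 1>": [[d] for d in "01"],
    "<0 to 3>": [[d] for d in "0123"],
    "<0 to 5>": [[d] for d in "012345"],
    "<0 to 9>": [[d] for d in string.digits],
    "<text>": [["<symbol>"], ["<symbol>", "<text>"]],
    # Assumption: the blank belongs to the symbols allowed in area names.
    "<symbol>": [[c] for c in string.ascii_uppercase + string.digits + " "],
}


def derives(form, s):
    """True if the terminal string s is derivable from the sentential form."""
    if not form:
        return s == ""
    head, rest = form[0], form[1:]
    if head in GRAMMAR:  # non-terminal: try every rule with this left part
        return any(derives(alt + rest, s) for alt in GRAMMAR[head])
    # terminal: it must be a prefix of the remaining input
    return s.startswith(head) and derives(rest, s[len(head):])


def is_correct(db):
    """A DB is correct if every fact is a word of the language L(G)."""
    return all(derives(["<fact>"], w) for w in db)
```

The recursion terminates because the grammar has no left recursion: every expansion of `<text>` must consume one input character before it can recurse.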

As may be seen, the classical formal grammars suggested by N. Chomsky as a tool for describing the syntactic structures of natural language sentences [29] are used here in such a way that the structures of DB facts are described by the metadatabase, which is itself the rule set of a CF grammar. In this application CF grammars are a simple, understandable and quite general model of a data description language (DDL). Note also that the MDB is itself a “set-of-strings” database whose elements are string representations of the rules $\alpha \to \beta$ (the only difference is that its alphabet includes angle brackets and the arrow in addition to $V$). This unifies the DSML sublanguages for the DB and the MDB.

As to the data manipulation language, we shall represent the substantial part $x$ of any access message $\omega_t = \langle o, \mathrm{DB}, x \rangle$ as a sentential form of the grammar $G_t$, so that

$$I_t = \{w \mid x \overset{*}{\Rightarrow} w \ \&\ w \in V^*\} \tag{22}$$

(for short, $\overset{*}{\Rightarrow}$ is used instead of $\overset{*}{\underset{G_t}{\Rightarrow}}$).

The set of all sentential forms of the grammar $G_t$ is denoted below as $SF(G_t)$:

$$SF(G_t) = \{y \mid \alpha_0 \overset{*}{\Rightarrow} y\}. \tag{23}$$


As may be seen, the answer to a query with substantial part $x \in SF(G_t)$ is the set of strings of the language $L(G_t)$ that are derived from the sentential form $x$ and are facts of the DB $W_t$:

$$A_{t+1} = \{w \mid w \in W_t \ \&\ x \overset{*}{\Rightarrow} w\}. \tag{24}$$

To preserve DB correctness, the definition of inclusion is rewritten as follows:

$$W_{t+1} = W_t \cup \{w \mid w \in I_t \ \&\ w \in L(G_t)\}, \tag{25}$$

while deletion by the access $\omega_t = \langle \mathrm{delete}, \mathrm{DB}, x \rangle$ is defined as

$$W_{t+1} = W_t - \{w \mid x \overset{*}{\Rightarrow} w \ \&\ w \in W_t\}. \tag{26}$$

Example 3. The substantial part of the query to the DB from examples 1 and 2 whose purpose is to obtain information about smoked areas looks like

x = AREA <name of area> SMOKED AT <time>. ■

Obviously, the “SF-like” query language does not express the selection conditions that define boundary restrictions on the numeric fields of records in ordinary relational databases. However, this necessary feature may easily be added to the language by ordering the set of rules with the same left part (non-terminal) in the way described in [28]. This allows any necessary linear order to be defined on the set of words generated from every non-terminal of the CF grammar, and such orders to be used for selecting facts from databases.

More sophisticated queries may be constructed as Boolean expressions over the atomic units $x \in SF(G_t)$, while the implementation of relationally complete query languages requires a transition to the knowledge segment of the SSF, which is based on the so-called augmented Post systems [25], [28].

The SSF representation of relational databases and the techniques for operating on them, as well as the modeling of pre- and post-relational DBs, are described in detail in [28].

The operational semantics of the DML is quite evident: it is based on a sequential scan of the facts $w \in W_t$ with a check of the derivability $x \overset{*}{\Rightarrow} w$, which is algorithmically solvable for CF grammars.

Various types of implementational semantics correspond to various fixed types of CF grammars (for example, the so-called page databases with key access, which support hypertext storage and online processing [26], [28] and are the unified SSF representation of today's web-interface portals and Twitter-like databases).
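The sequential scan with a derivability check may be sketched as follows. The sketch relies on an assumption that holds for the grammar of example 2 but not for CF grammars in general: every non-terminal used in a query generates a regular language, so a sentential form can be translated into a regular expression. The table `NONTERMINALS` and the functions `compile_form`, `query` and `delete` are illustrative names, not part of the SSF.

```python
import re

# Regular-expression equivalents of some non-terminals of example 2
# (an assumed encoding; blanks are allowed inside area names).
NONTERMINALS = {
    "<name of area>": r"[A-Z0-9 ]+",
    "<state>": r"(?:NORMAL|SMOKED)",
    "<hours>": r"(?:[01][0-9]|2[0-3])",
    "<minutes>": r"[0-5][0-9]",
    "<time>": r"(?:[01][0-9]|2[0-3])\.[0-5][0-9]",
}


def compile_form(x):
    """Translate a sentential form into a compiled regular expression."""
    parts = re.split(r"(<[^<>]+>)", x)  # keep the non-terminals as separate parts
    return re.compile("".join(
        NONTERMINALS[p] if p in NONTERMINALS else re.escape(p) for p in parts))


def query(db, x):
    """Answer to a query: facts of the DB derivable from x (sequential scan)."""
    pattern = compile_form(x)
    return {w for w in db if pattern.fullmatch(w)}


def delete(db, x):
    """Deletion: remove every stored fact derivable from x; return DB and answer."""
    answer = query(db, x)
    return db - answer, answer


W = {
    "AREA GREEN FOREST SMOKED AT 15.10",
    "AREA LITTLE LAKE NORMAL AT 15.03",
    "AREA HIGHLAND SMOKED AT 15.10",
}
smoked = query(W, "AREA <name of area> SMOKED AT <time>")
```

Here `smoked` contains the two facts about smoked areas. For a grammar with genuinely context-free structure the regular expression would have to be replaced by a general CF recognizer.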

III. SET-OF-STRINGS DATABASES WITH INCOMPLETE INFORMATION

Because of the limited size of the article, let us begin the consideration of SSF modeling of databases with incomplete information directly from the mathematical semantics of the corresponding data manipulation language (denoted DMLI).

DMLI is based on the same MDB representation in the form of the set $D_t$ of context-free grammar rules $\alpha \to \beta$. A database with incomplete information, denoted $X_t$, is a finite set of sentential forms of the CF grammar $G_t$, called incomplete facts (N-facts) in this application:

$$X_t = \{x_1, \dots, x_{m_t}\} \subset SF(G_t). \tag{27}$$

Example 4. Consider the MDB from example 2 and the corresponding DBI

$X_t$ = {AREA LONELY TREES NORMAL AT 12.01,
AREA LONELY TREES <state> AT 13.<minutes>,
AREA <name of area> SMOKED AT 14.30}.

As seen, the DBI contains three N-facts, the first of which is an ordinary fact in the sense of (4)-(9). The second N-fact carries information about the same monitored object, but with an unknown state of the area and a time lying in the interval 13.00-13.59. The third N-fact carries information about an unknown area that was smoked at 14.30. One may suppose that the inclusion of the last two N-facts in the DBI is a consequence of a malfunction of the sensors and/or the communication hardware. ■

Consider the equations describing DBI update and query processing.

We shall suppose without loss of generality that the CF grammar $G_t$ corresponding to the MDB $D_t$ is acyclic and unambiguous in the sense of [30]. Under these suppositions the set $SF(G_t)$ is partially ordered by the relation $\overset{*}{\Rightarrow}$: it is reflexive ($x \overset{*}{\Rightarrow} x$), antisymmetric (if $x \overset{*}{\Rightarrow} y$ and $y \overset{*}{\Rightarrow} x$, then $x = y$) and transitive (if $x \overset{*}{\Rightarrow} y$ and $y \overset{*}{\Rightarrow} z$, then $x \overset{*}{\Rightarrow} z$).

The set $SF(G_t)$ has a maximal element: the axiom $\alpha_0$ (“fact”), because $\alpha_0 \overset{*}{\Rightarrow} x$ for every $x \in SF(G_t)$. For every subset $X \subseteq SF(G_t)$ there exists the set of its upper bounds, i.e. sentential forms (“N-facts”) $y \in SF(G_t)$ such that $y \overset{*}{\Rightarrow} x$ for all $x \in X$, and the minimal (least) upper bound $\sup X$ such that $y \overset{*}{\Rightarrow} \sup X$ for every other upper bound $y$ from this set. For some $X \subseteq SF(G_t)$ there may exist a set of lower bounds, i.e. sentential forms (“N-facts”) $y \in SF(G_t)$ such that $x \overset{*}{\Rightarrow} y$ for all $x \in X$, and the maximal (greatest) lower bound $\inf X$ such that $\inf X \overset{*}{\Rightarrow} y$ for every other lower bound $y$.

Example 5. For the DBI from example 4, $\inf X_t$ does not exist, but

sup $X_t$ = AREA <name of area> <state> AT 1<0 to 9>.<minutes>.

At the same time

inf {AREA LONELY TREES <state> AT 12.30,
AREA <name of area> SMOKED AT 12.<minutes>}
= AREA LONELY TREES SMOKED AT 12.30. ■
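The bounds of example 5 can be reproduced with a deliberately simplified model. It is an assumption of this sketch, not the general SSF construction, that an N-fact is a tuple of six slots (area name, state, two hour digits, two minute digits) and that `None` marks a slot whose non-terminal has not been expanded; the names `derives`, `sup` and `inf` are illustrative.

```python
from functools import reduce

U = None  # an unexpanded non-terminal ("unknown" slot)


def derives(x, y):
    """x =>* y: every slot known in x has the same value in y."""
    return all(a is U or a == b for a, b in zip(x, y))


def sup2(x, y):
    """Least upper bound of two N-facts: keep only what they agree on."""
    return tuple(a if a == b else U for a, b in zip(x, y))


def inf2(x, y):
    """Greatest lower bound of two N-facts, or None if they contradict."""
    out = []
    for a, b in zip(x, y):
        if a is U:
            out.append(b)
        elif b is U or a == b:
            out.append(a)
        else:
            return None  # two different known values: no common concretization
    return tuple(out)


def sup(facts):
    return reduce(sup2, facts)


def inf(facts):
    return reduce(lambda acc, f: None if acc is None else inf2(acc, f), facts)


# DBI of example 4: slots are (name, state, h1, h2, m1, m2).
X_t = [
    ("LONELY TREES", "NORMAL", "1", "2", "0", "1"),
    ("LONELY TREES", U, "1", "3", U, U),
    (U, "SMOKED", "1", "4", "3", "0"),
]
upper = sup(X_t)   # only the first hour digit "1" survives
lower = inf(X_t)   # None: the stored N-facts contradict one another
```

The model loses grammar constraints such as the restriction of hours to 00-23, so it only illustrates the order relation and the two bounds.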
Proceeding from the above, we shall interpret the relation $\overset{*}{\Rightarrow}$ of mutual derivability of the sentential forms of the CF grammar $G_t$ as the relation of mutual informativity of incomplete facts $x \in SF(G_t)$. Within this interpretation, $x \overset{*}{\Rightarrow} x'$ means that the N-fact $x'$ is not less informative than the N-fact $x$ (if $x \ne x'$, then $x'$ is more informative and is called a concretization of $x$). If $X \subseteq X_t$, then $\sup X$ is the N-fact that is maximally informative among all N-facts that are less informative than all N-facts from the set $X$. At the same time the N-fact $\inf X$, if it exists, is the minimally informative N-fact among all N-facts that are more informative than all N-facts from the set $X$. (This interpretation agrees with the basic postulates of A. Kolmogorov's algorithmic theory of information, i.e. the mutual complexity of constructive objects [39].)

Let us consider a DBI $X_t = \{x_1, \dots, x_{m_t}\} \subseteq SF(G_t)$ about which it is reasonable to suppose that it contains the maximally informative N-facts that the system is able to acquire. In this case the presence of an N-fact $x \in X_t$ excludes the presence of another N-fact $x' \in X_t$ such that $x' \overset{*}{\Rightarrow} x$, because $x'$ is obviously redundant here. We shall call a DBI $X_t$ that does not contain such redundant N-facts a “non-redundant DBI”. Below we consider only non-redundant databases with incomplete information, unless the contrary is stated explicitly.

Under this assumption, the M-semantics of non-redundant DBI update (i.e. the result of including an N-fact $x \in SF(G_t)$) is described by the following expression:

$$X_{t+1} = X_t \cup \{x\} - \{y \mid y \in X_t \ \&\ (x \overset{*}{\Rightarrow} y \lor y \overset{*}{\Rightarrow} x)\}. \tag{28}$$

In other words, the inclusion of an N-fact $x$ in the DBI results in the deletion from the DBI of all N-facts that are more or less informative than $x$. Both deletions are necessary to maintain DBI non-redundancy. The “novelty” of the N-fact $x$, which reflects the state of the problem area at the latest moment $t$, is the ground for storing $x$ while eliminating from the DBI the N-facts “comparable by informativity” with $x$. The answer to the inclusion of an N-fact in the DBI must contain the N-facts eliminated from the database (those more or less informative than $x$):

$$A_{t+1} = X_t - X_{t+1}. \tag{29}$$

Example 6. If the N-fact x = AREA GREEN FOREST SMOKED AT 14.30 is included in the DBI $X_t$ from example 4, then

$X_{t+1}$ = {AREA LONELY TREES NORMAL AT 12.01,
AREA LONELY TREES <state> AT 13.<minutes>,
AREA GREEN FOREST SMOKED AT 14.30},

because

AREA <name of area> SMOKED AT 14.30 $\overset{*}{\Rightarrow}$ AREA GREEN FOREST SMOKED AT 14.30. ■
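The update of a non-redundant DBI may be sketched in a simplified slot model. The assumptions are made here for illustration only: an N-fact is a tuple (area name, state, hours, minutes), `None` stands for an unexpanded non-terminal, and the function name `include` is hypothetical. The new N-fact is added after the comparable ones are removed, so that it is kept even though it trivially derives itself.

```python
U = None  # an unexpanded non-terminal ("unknown" slot)


def derives(x, y):
    """x =>* y: every slot known in x has the same value in y."""
    return all(a is U or a == b for a, b in zip(x, y))


def include(dbi, x):
    """Non-redundant inclusion: returns the new DBI and the answer."""
    comparable = {y for y in dbi if derives(x, y) or derives(y, x)}
    return (dbi - comparable) | {x}, comparable


# DBI of example 4: slots are (name, state, hours, minutes).
X_t = {
    ("LONELY TREES", "NORMAL", "12", "01"),
    ("LONELY TREES", U, "13", U),
    (U, "SMOKED", "14", "30"),
}
X_t1, A_t1 = include(X_t, ("GREEN FOREST", "SMOKED", "14", "30"))
```

As in example 6, the N-fact about the unknown smoked area is replaced by its concretization, and the answer `A_t1` contains exactly the eliminated N-fact.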

As to query M-semantics, there may be two different versions, which stem from the new features of the DBI in comparison with the DB.

The first version is a direct generalization of (24):

$$A^{1}_{t+1} = \{x \mid x \in X_t \ \&\ y \overset{*}{\Rightarrow} x\}, \tag{30}$$

where $A^{1}_{t+1}$ is the answer to the query with content $y$. As seen, all N-facts from the DBI $X_t$ that are no less informative than $y$ are included in the answer.

The second version is more precise and practically useful. It treats the purpose of the query as a check of whether facts more informative than the N-facts present in the DBI $X_t$ are possible. Returning to S-semantics, we may postulate that the DBI $X_t$ contains an N-fact $x$ and the query content $y$ is such that

$$(\exists w \in L(G_t))\ x \overset{*}{\Rightarrow} w \ \&\ y \overset{*}{\Rightarrow} w, \tag{31}$$

so, although $x$ is not a concretization of $y$, it is sensible to include $x$ in the answer, because there may be facts $w \in L(G_t)$ that are concretizations of both $x$ and $y$. The set of such facts is the intersection $W_x \cap W_y$, where

$$W_x = \{w \mid x \overset{*}{\Rightarrow} w\}, \tag{32}$$

$$W_y = \{w \mid y \overset{*}{\Rightarrow} w\}. \tag{33}$$

The finite representation of this intersection is, obviously, the maximal lower bound of the set $\{x, y\}$. For this reason, the answer to the query with content $y$ may be the set

$$A^{2}_{t+1} = \{x \mid x \in X_t \ \&\ \exists \inf\{x, y\}\}, \tag{34}$$

and, as an alternative,

$$A^{3}_{t+1} = \{\inf\{x, y\} \mid x \in X_t \ \&\ \exists \inf\{x, y\}\}. \tag{35}$$

Example 7. Consider the DBI $X_t$ from example 4 and the query with content

y = AREA <name of area> SMOKED AT <time>.

The purpose of this query is to get information about all smoked areas. According to (30), (34) and (35),

$A^{1}_{t+1}$ = {AREA <name of area> SMOKED AT 14.30},

$A^{2}_{t+1}$ = {AREA LONELY TREES <state> AT 13.<minutes>,
AREA <name of area> SMOKED AT 14.30},

$A^{3}_{t+1}$ = {AREA LONELY TREES SMOKED AT 13.<minutes>,
AREA <name of area> SMOKED AT 14.30}. ■
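The three versions of the answer compared in example 7 can be computed in a simplified slot model. As an assumption of this sketch, an N-fact is a tuple (area name, state, hours, minutes) with `None` for an unexpanded non-terminal; the function names `answer_1`, `answer_2` and `answer_3` are illustrative labels for the direct generalization, the compatibility-based version and its variant returning the lower bounds themselves.

```python
U = None  # an unexpanded non-terminal ("unknown" slot)


def derives(x, y):
    """x =>* y: every slot known in x has the same value in y."""
    return all(a is U or a == b for a, b in zip(x, y))


def inf2(x, y):
    """Greatest lower bound of {x, y}, or None if it does not exist."""
    out = []
    for a, b in zip(x, y):
        if a is U:
            out.append(b)
        elif b is U or a == b:
            out.append(a)
        else:
            return None
    return tuple(out)


def answer_1(dbi, y):
    """Stored N-facts that are concretizations of the query y."""
    return {x for x in dbi if derives(y, x)}


def answer_2(dbi, y):
    """Stored N-facts that may still have a common concretization with y."""
    return {x for x in dbi if inf2(x, y) is not None}


def answer_3(dbi, y):
    """The lower bounds inf{x, y} themselves, where they exist."""
    return {inf2(x, y) for x in dbi if inf2(x, y) is not None}


X_t = {
    ("LONELY TREES", "NORMAL", "12", "01"),
    ("LONELY TREES", U, "13", U),
    (U, "SMOKED", "14", "30"),
}
y = (U, "SMOKED", U, U)  # AREA <name of area> SMOKED AT <time>
a1, a2, a3 = answer_1(X_t, y), answer_2(X_t, y), answer_3(X_t, y)
```

The second and third versions also return the N-fact about LONELY TREES at 13.<minutes>, whose state is unknown and may therefore turn out to be SMOKED.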

Now, having a minimal toolkit for maintaining and processing databases with incomplete information in the “Set-of-Strings” Framework, we may consider some key aspects of N-fact fusion.

IV. INCOMPLETE FACTS FUSION 

N-fact fusion may be implemented in two versions: preprocessing and postprocessing.

In the first case (usually during OLAP), the fragmentary data arriving from various sensors are “assembled” in order to create a maximally informative N-fact, which is passed on to subsequent processing. Each of these sensors surveys the monitored objects in such a way that one part of their state parameters is known more precisely and another less precisely. The “assembled” N-fact must contain all the information present in the incoming N-facts, provided they do not contradict one another. If they do, some extra processing is needed.

In the second case (usually during Data Mining), the DBI subset selected by a query is checked for the possibility of “assembling” a maximally informative N-fact from the selected N-facts. This N-fact, if it exists, accumulates all the information contained in the selected N-facts.

Consider both cases.

Preprocessing. Let us have the set of N-facts 
-V = G SF(G r ]\ , corresponding to one and the 

same monitored object. This set at the current step of the 
OLAP system operating process enters this system. From the 
substantial point of view it is obvious, that N-fact containing 
all information, which is contained in all N-facts from this set, 
is x — inf X, if maximal lower bound of the set X does exist. 

Set X in this case is named “non-contradictory” (if inf X does 
not exist - respectively, “contradictory”). Various aspects of 
the contradictory sets of N-facts processing are object of the 
separate publication. From the general point of view, it’s 
obvious that if contradictions exist, the maximum one can 
make while “assembling”, is to include to the resulting N-fact 
~ all fragments of information from N-facts , 

which do not contradict one another. Formally, in this case 

x — s up X. 

Example 8 . Consider X = where 

x i = CAR < Ford or Bentley > COLO US 

< white or gray > NUMBER MN < l ></></ >, 
.r : = CAR FORD COLOUR WHITE NUMBER <l> 

< l >< l >< f > 6 

x 2 = CAR FORD COLOUR < colour > NUMBER < 
l > NX1 <f> 

x A = CAR < brand> COLOUR < colour > 

NUMBER M < l > X< f >< f > 
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Metadatabase relating this set of N-facts include following MNX < f > 6 , 

rules: 


< fact >CAR < brand > COLO UR < colour > that 
NUMBER < l >< l >< l >< fxf >, 


< brand > -+ < Ford or Bent lev >, 

< brand > -+ < 5A/IF or Audi >, 

< Ford or Beni lev >-* FORD , 

< Ford or Bentlev >-* BENTLEY. 

L 1 " 

< BMW or A udi >-* 5_WTF, 

< ^A/FTojrjlirrfi >-* AUDI, 

< colour >-*< wAzte or gray >, 

< >-*< >, 

< white or gray WHITE , 

< wAiteor^ray >-* 

< tfroH'jz >-* BLACK, 

< black or brown >-* 

< E >-* A, 

■ n i 

< i >-* z, 

< / >^ 0 , 

■ II I 

< / 9. 

The set contains N-facts obtained from interrogation of 
witnesses to a road traffic accident, committed by car driver 
fled the scene. The first witness (xj asserts that car was 
white or gray, Ford or Bentley, while alphabetical part of its 
number begun from M and N letters. The second witness 
($x_2$) thinks that the car was a white Ford and that the last figure of its number is 6. The third witness ($x_3$) saw that the car was a Ford, that the last two letters of the alphabetic part of its number were N and X, and that the first figure was 1. The fourth witness ($x_4$) knows neither the car brand nor its colour, but he is sure that the alphabetic part of the car number begins with M and ends with X. The result of preprocessing is

$\inf X$ = CAR FORD COLOUR WHITE NUMBER MNX 16.

That means the accident was probably committed by the driver of the white Ford with number MNX 16. ■

Postprocessing. Independently of the M-semantics version, (30), (34) or (35), the reply to a query $y$ to the DBI is some set $A = \{x_1, \dots, x_m\}$ of N-facts. To "assemble" from this set the maximally informative N-fact, which contains all information present in the N-facts of the set $A$, it is sufficient to create the N-fact $x = \inf A$. If $x$ exists ($A$ is non-contradictory), then $x$ itself is the required N-fact.

Otherwise, i.e. if $A$ is contradictory, it is sufficient, as in the preprocessing mode, to create the N-fact $x = \sup A$, which contains all information that is "common" to all N-facts belonging to $A$.

Example 9. Consider the set $X = \{x_1, x_2, x_3, x_4\}$, where $x_1, x_2, x_3$ are from Example 8, and

$x_4$ = CAR BENTLEY COLOUR <colour> NUMBER M<l><l>46.

As seen, $\inf X$ does not exist, while

$\sup X$ = CAR <brand> COLOUR WHITE NUMBER MNX <f>6

corresponds to a white car of unknown brand, with alphabetic part MNX and a numeric part of the car number whose last figure is 6 and whose first figure is unknown. ■

The described key elements of N-fact fusion are based on the creation of minimal upper and maximal lower bounds of a "two-N-facts" set. Consider the algorithmics of these operations, which is the kernel of the DMLI operational semantics.

V. KEY ALGORITHMS OF INCOMPLETE FACTS SETS
PROCESSING

Let us begin by formulating a criterion for recognizing whether $\inf\{x, y\}$ exists, where $x, y \in SF(G)$ and $G$ is an unambiguous and acyclic CF grammar.

Theorem 1 [25]. Let $x$ and $y$ be sentential forms of the unambiguous and acyclic CF grammar $G = \langle V, N, \alpha_0, R \rangle$. Then $\inf\{x, y\}$ exists if and only if $\sup\{x, y\}$ does not contain a non-terminal $\alpha \in N$ such that $\sup\{x, y\} = z_1 \alpha z_2$, where $z_1, z_2 \in (V \cup N)^*$, and $R$ includes two rules $\alpha \to \beta$ and $\alpha \to \beta'$ such that

$\beta \neq \beta'$, (36)

$z_1 \beta z_2 \overset{*}{\Rightarrow} x$, (37)

$z_1 \beta' z_2 \overset{*}{\Rightarrow} y$. ■ (38)

This criterion may be reformulated into a simpler one, which is more convenient for practical use.

Let $\sup\{x, y\} = w_1 \alpha_{i_1} w_2 \dots \alpha_{i_j} \dots w_m \alpha_{i_m} w_{m+1}$, where $w_j \in V^*$, $\alpha_{i_j} \in N$. Let us denote by $A_x$ the set of numbers of non-terminals, in the order in which they occur in the SF $\sup\{x, y\}$, such that $j \in A_x$ if

$x = x_1 \alpha_{i_j} x_2$, (39)

$w_1 \alpha_{i_1} \dots w_j \overset{*}{\Rightarrow} x_1$, (40)

$w_{j+1} \alpha_{i_{j+1}} \dots w_{m+1} \overset{*}{\Rightarrow} x_2$. (41)

In other words, $\sup\{x, y\}$ and $x$ are "equiinformative" "in the sense" of the $j$-th non-terminal of the SF $\sup\{x, y\}$, because this non-terminal is not used in the generation (derivation) of the SF $x$ from $\sup\{x, y\}$. $A_x$ is the set of all such numbers. $A_y$ is defined by analogy.

Example 10. As may be seen from Example 8,

$\sup\{x_1, x_2\}$ = CAR <Ford or Bentley> COLOUR <white or gray> NUMBER <l><l><l><f><f>,

$A_{x_1} = \{1, 2, 5, 6, 7\}$, $A_{x_2} = \{3, 4, 5, 6\}$. ■

The practical criterion for recognizing the existence of $\inf\{x, y\}$ is given by the following theorem.

Theorem 2 [25]. Let

$\sup\{x, y\} = w_1 \alpha_{i_1} w_2 \dots w_m \alpha_{i_m} w_{m+1}$.

Then $\inf\{x, y\}$ exists if and only if $A_x \cup A_y = \{1, \dots, m\}$. ■

In Example 10, $A_{x_1} \cup A_{x_2} = \{1, \dots, 7\}$, so $\inf\{x_1, x_2\}$ does exist.
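The criterion of Theorem 2 can be sketched in Python under a strong simplifying assumption: every N-fact is a fixed-length tuple of slots, and each slot holds either a terminal or the single non-terminal allowed at that position. The slot layout `SCHEMA`, the helper names, and the encodings of $x_2$, $x_3$ (Example 8) and $x_4$ (Example 9) are illustrative and are not taken from the paper; for a general CF grammar, real derivations would be needed instead of slot comparison.

```python
# Flat "slot" model of N-facts (an assumption of this sketch, not the paper's
# general CF-grammar setting): one slot per position, each slot is a terminal
# or the non-terminal permitted at that position.
SCHEMA = ("<brand>", "<colour>", "<l>", "<l>", "<l>", "<f>", "<f>")


def is_nonterminal(token):
    return token.startswith("<") and token.endswith(">")


def sup(x, y):
    """Least upper bound: keep a slot where x and y agree, else generalize it."""
    return tuple(a if a == b else nt for a, b, nt in zip(x, y, SCHEMA))


def unexpanded(z, x):
    """A_x: 1-based numbers of the non-terminals of z that x leaves unexpanded."""
    positions = [i for i, tok in enumerate(z) if is_nonterminal(tok)]
    return {k for k, i in enumerate(positions, start=1) if x[i] == z[i]}


def inf_exists(x, y):
    """Theorem 2: inf{x, y} exists iff A_x united with A_y equals {1, ..., m}."""
    z = sup(x, y)
    m = sum(is_nonterminal(tok) for tok in z)
    return unexpanded(z, x) | unexpanded(z, y) == set(range(1, m + 1))


X2 = ("FORD", "WHITE", "<l>", "<l>", "<l>", "<f>", "6")
X3 = ("FORD", "<colour>", "<l>", "N", "X", "1", "<f>")
X4 = ("BENTLEY", "<colour>", "M", "<l>", "<l>", "4", "6")

print(inf_exists(X2, X3))
print(inf_exists(X2, X4))
```

In this encoding the pair X2, X3 satisfies the criterion, while for X2 and X4 the brand slot is expanded differently by both facts (FORD versus BENTLEY), so its number belongs to neither set and the criterion fails.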

An algorithm that correctly creates $\inf\{x, y\}$, when it exists, is the following.

INF: function (x, y) returns ();
  z := SUP(x, y);
  do i ∈ M(z);
    if i ∈ Ax(z)
      then substitute into z, instead of α_i, the string y' which is derived from α_i in the derivation z ⇒* y;
      else substitute into z, instead of α_i, the string x' which is derived from α_i in the derivation z ⇒* x;
  end;
  return (z);
end INF

In the body of INF, the variables x and y are the input sentential forms (N-facts), Ax(z) denotes the set $A_x$ of the SF $z = \sup\{x, y\}$, and M(z) is the set of non-terminal position numbers in the SF z. The function SUP provides the derivation of $\sup\{x, y\}$:

SUP: function (x, y) returns ();
  return (SUP#(x, y, α_0));

  SUP#: function (x, y, t) returns ();
    if (∃a ∈ (V ∪ N)*)(∃b ∈ (V ∪ N)*)(∃α → β ∈ R)
       t = aαb & aβb ⇒* x & aβb ⇒* y
    then return (SUP#(x, y, aβb));
    else return (t);
  end SUP#;
end SUP

As seen from the body of the subfunction SUP#, all N-facts that are passed as the third parameter in the initial call (from the body of SUP) and in the following calls are upper bounds of the set $\{x, y\}$, and the N-fact that cannot be "concreted" "in the x and y directions" simultaneously is the minimal upper bound of this set.
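The interplay of SUP and INF can be illustrated by a minimal Python sketch in the same flat "slot" model (an assumption of this sketch: one slot per position, holding a terminal or the non-terminal named in the hypothetical `SCHEMA`). The recursion of SUP# over grammar rules degenerates here into a slot-wise comparison, so the code mirrors only the roles of the two functions, not the general algorithm.

```python
# Slot layout is hypothetical: brand, colour, three letters, two figures.
SCHEMA = ("<brand>", "<colour>", "<l>", "<l>", "<l>", "<f>", "<f>")


def is_nonterminal(token):
    return token.startswith("<") and token.endswith(">")


def sup(x, y):
    """Minimal upper bound: a slot stays concrete only if x and y agree on it."""
    return tuple(a if a == b else nt for a, b, nt in zip(x, y, SCHEMA))


def inf(x, y):
    """Mirror of INF: fill every non-terminal of sup{x, y} from x or from y."""
    z = list(sup(x, y))
    for i, tok in enumerate(z):
        if not is_nonterminal(tok):
            continue          # slot already agreed upon by x and y
        if x[i] == tok:       # i is in A_x: x adds nothing, take y's string
            z[i] = y[i]
        elif y[i] == tok:     # i is in A_y only: take x's string
            z[i] = x[i]
        else:                 # both facts expand the slot differently
            return None       # inf{x, y} does not exist
    return tuple(z)


X2 = ("FORD", "WHITE", "<l>", "<l>", "<l>", "<f>", "6")
X3 = ("FORD", "<colour>", "<l>", "N", "X", "1", "<f>")
X4 = ("BENTLEY", "<colour>", "M", "<l>", "<l>", "4", "6")

print(sup(X2, X3))
print(inf(X2, X3))
print(inf(X2, X4))
```

Returning `None` for a contradictory pair is a design choice of the sketch; the paper's INF is defined only for the case when the lower bound exists.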

A graphical illustration of the essence of the described algorithms is given in Fig. 1a. Here

$x = u_1 \alpha_{j_1} u_2 \dots u_i \alpha_{j_i} u_{i+1} \dots$,

$x' = u_1 \alpha_{j_1} u_2 \dots u_i u'_i u_{i+1} \dots$,

so $\sup\{x, x'\} = u_1 \alpha_{j_1} u_2 \dots$,

while in Fig. 1b $\inf\{x, x'\} = u_1 u'_1 u_2 \dots u_i u'_i u_{i+1} \dots u_{m+1}$, where $u_j \in (V \cup N)^*$, $u'_j \in (V \cup N)^+$, $\alpha_{j_i} \in N$.

As one can see, because of the conflicting derivations shown in Fig. 1a, $\inf\{x, x'\}$ does not exist there. On the contrary, since there is no such conflict in Fig. 1b, $\inf\{x, x'\}$ does exist.

Fig. 1. Graphical illustration: (a) $\sup\{x, x'\}$; (b) $\inf\{x, x'\}$.

Example 11. Consider $x_2$ and $x_3$ from Example 8. By direct execution of the SUP and INF functions,

$\sup\{x_2, x_3\}$ = CAR FORD COLOUR <colour> NUMBER <l><l><l><f><f>,

$\inf\{x_2, x_3\}$ = CAR FORD COLOUR WHITE NUMBER <l>NX16. ■

Thus, we have the algorithmics for processing two-element sets of N-facts. Consider the main features of the similar algorithmics for a set consisting of an arbitrary number of N-facts.

Let $X = \{x_1, \dots, x_m\} \subseteq SF(G)$, where $m > 2$.

Lemma 1. For every sequence $\langle i_1, \dots, i_m \rangle$ such that $\{i_1, \dots, i_m\} = \{1, \dots, m\}$,

$\sup X = \sup\{x_{i_1}, \sup\{x_{i_2}, \dots \sup\{x_{i_{m-1}}, x_{i_m}\} \dots \}\}$ (42)

and, if $\inf X$ exists,

$\inf X = \inf\{x_{i_1}, \inf\{x_{i_2}, \dots \inf\{x_{i_{m-1}}, x_{i_m}\} \dots \}\}$. (43)

Proof. As shown above, $SF(G)$ is a partially ordered set. For sets of this kind the operations $\sup X$ and $\inf X$, where $X$ is a finite subset of $SF(G)$, are commutative and associative, which is why (42) and (43) are true. ■
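Lemma 1 means that the bounds of a finite set can be obtained by folding the two-element operations in any order. A minimal Python sketch, again in the hypothetical flat "slot" model (the `SCHEMA`, the function names and the encoding of the witnesses' N-facts are assumptions of the sketch), checks this order independence on three facts:

```python
from functools import reduce
from itertools import permutations

# Hypothetical slot layout: brand, colour, three letters, two figures.
SCHEMA = ("<brand>", "<colour>", "<l>", "<l>", "<l>", "<f>", "<f>")


def sup2(x, y):
    """Two-element upper bound in the flat slot model."""
    return tuple(a if a == b else nt for a, b, nt in zip(x, y, SCHEMA))


def inf2(x, y):
    """Two-element lower bound, or None when the facts contradict each other."""
    if x is None or y is None:
        return None
    out = []
    for a, b, nt in zip(x, y, SCHEMA):
        if a == b or b == nt:
            out.append(a)      # y adds nothing in this slot
        elif a == nt:
            out.append(b)      # x adds nothing in this slot
        else:
            return None        # two different terminals: contradiction
    return tuple(out)


def sup_set(facts):
    return reduce(sup2, facts)     # right side of (42), in the given order


def inf_set(facts):
    return reduce(inf2, facts)     # right side of (43), in the given order


X2 = ("FORD", "WHITE", "<l>", "<l>", "<l>", "<f>", "6")
X3 = ("FORD", "<colour>", "<l>", "N", "X", "1", "<f>")
X4 = ("<brand>", "<colour>", "M", "<l>", "X", "<f>", "<f>")  # fourth witness

results = {inf_set(p) for p in permutations([X2, X3, X4])}
print(results)
```

Every ordering of the three facts yields the same lower bound, which in this encoding is the white Ford with number MNX 16.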

The main problem in implementing the described approach is the minimization of redundant search while manipulating the DBI. Let us look through the basic elements of the techniques for solving this problem.

VI. BASIC ELEMENTS OF THE SET-OF-STRINGS 
ASSOCIATIVE STORAGE AND ACCESS 

The main idea of eliminating redundant search in set-of-strings databases (including databases with incomplete information) is to use the so-called SF-tree, whose structure is induced by the relation of mutual informativity of N-facts.

Let us begin with a database $W \subset V^*$, whose metadatabase is $D$ (the index $t$ is omitted for simplicity), and $G = \langle V, N, \alpha_0, D \rangle$.

The SF-tree of the database $W$ is denoted $\tau(W)$ and possesses the following features:

1) the root of the tree is $\alpha_0$;

2) terminal nodes are facts $w \in W$;

3) internal (non-terminal) nodes are sentential forms of the grammar $G$;

4) for each non-terminal node $x \in SF(G)$ and each of its closest descendants (sons) $x'$, the condition $x \overset{+}{\underset{G}{\Rightarrow}} x'$ is true;

5) the number of sons of every non-terminal node does not exceed the value

$m = 1 + \max\{\, l \mid \{\alpha \to \beta_1, \dots, \alpha \to \beta_l\} \subseteq D \,\}$, (44)

i.e. the maximal number of alternatives of a non-terminal symbol plus one (by transforming the scheme $D$ to the bialternative mode [27], [28] one may fix $m = 3$).

The opportunity to reduce the search by means of the SF-tree $\tau(W)$ follows from the following theorem.

Theorem 3 [28]. If the CF grammar $G = \langle V, N, \alpha_0, D \rangle$ is unambiguous and acyclic, $x$ and $y$ are its sentential forms, and $\inf\{x, y\}$ does not exist, then for every $x' \in SF(G)$ such that $x \overset{+}{\underset{G}{\Rightarrow}} x'$, $\inf\{x', y\}$ does not exist either. ■

This allows, while processing an access with substantial part $y$, to eliminate from the search every subtree of the tree $\tau(W)$ whose root $x$ is such that $\inf\{x, y\}$ does not exist: by Theorem 3 and feature 4 of $\tau(W)$, for all terminal nodes of the subtree (i.e. facts $w \in W$ such that $x \overset{+}{\underset{G}{\Rightarrow}} w$) $\inf\{y, w\}$ does not exist either. The latter is equivalent to the falsity of the condition $y \overset{*}{\underset{G}{\Rightarrow}} w$, which is necessary for the inclusion of the fact $w \in W$ in the answer.

Accumulation of the answer to the query by SF-tree search ("SDB index navigation") is implemented by the recursive function SRCHK with two arguments, the first of which is $y$ (the substantial part of the query) and the second is $x_\Sigma$ (the root of the searched subtree). The function SRCHK returns the set of facts from $W$ which are located at terminal nodes of the mentioned subtree and are derived from $x_\Sigma$. Here the evident equality $\inf\{y, x_\Sigma\} = x_\Sigma$, if $y \overset{*}{\Rightarrow} x_\Sigma$, is used. In the text of SRCHK the variable $X_\Sigma$ is the set of sons of $x_\Sigma$.

SRCHK: function (y, x_Σ) returns ();
  α := {∅}; /* initial value of the accumulating variable α */
  if x_Σ is terminal node
    then α := {x_Σ};
    else do x' ∈ X_Σ;
      if ∃ inf{y, x'}
        then α :∪ SRCHK(y, x');
    end x';
  return (α);
end SRCHK

The application of this function to the SDB $W$ and the query $y$ is implemented by the call SRCHK($y$, $\alpha_0$), where, as a reminder, $\alpha_0$ (the axiom of the CF grammar $G$ with scheme $D$) is the root of the tree $\tau(W)$.
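The pruning performed by SRCHK can be sketched in Python as follows. The sketch assumes a flat model in which a fact is a tuple (area, state, time) and a slot is either a terminal or the non-terminal given in the hypothetical `SCHEMA`; the tree shape, the `Node` class and the `visited` log are illustrative, since the actual figure of the SF-tree is not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical slot layout for facts "AREA <name of area> <state> AT <time>".
SCHEMA = ("<name of area>", "<state>", "<time>")


def inf_exists(x, y):
    """In the flat model inf{x, y} exists iff no slot holds two different terminals."""
    return all(a == b or a == nt or b == nt for a, b, nt in zip(x, y, SCHEMA))


@dataclass
class Node:
    form: tuple                       # sentential form stored in the node
    sons: list = field(default_factory=list)


def srchk(y, node, visited):
    """Collect facts under `node` compatible with query y, skipping dead subtrees."""
    visited.append(node.form)
    if not node.sons:                 # terminal node: a fact of the database
        return {node.form}
    answer = set()
    for son in node.sons:
        if inf_exists(y, son.form):   # otherwise the whole subtree is pruned
            answer |= srchk(y, son, visited)
    return answer


W1 = ("LONELY TREES", "NORMAL", "12.00")
W2 = ("LONELY TREES", "NORMAL", "13.30")
W3 = ("GREEN FOREST", "SMOKED", "14.50")
W4 = ("GREEN FOREST", "SMOKED", "15.30")
X1 = Node(("LONELY TREES", "NORMAL", "<time>"), [Node(W1), Node(W2)])
X2 = Node(("GREEN FOREST", "SMOKED", "<time>"), [Node(W3), Node(W4)])
ROOT = Node(SCHEMA, [X1, X2])

visited = []
answer = srchk(("<name of area>", "SMOKED", "<time>"), ROOT, visited)
print(sorted(answer))
```

The `visited` list shows that neither the node for LONELY TREES nor its two facts are touched, which is exactly the saving the SF-tree is meant to provide.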

Example 12. Let

$W_t$ = {AREA LONELY TREES NORMAL AT 12.00,
AREA LONELY TREES NORMAL AT 13.30,
AREA GREEN FOREST SMOKED AT 14.50,
AREA GREEN FOREST SMOKED AT 15.30}

and let the metadatabase $D$ be from Example 2. A possible SF-tree $\tau(W)$ is shown in Fig. 2.

Fig. 2. SF-tree $\tau(W)$.

Let $y$ = AREA <name of area> SMOKED AT <time> (the purpose of the query is to get all information about smoked areas). The search is performed as follows:

(step 1) $\inf\{y, x_1\}$ does not exist (so $w_1$ and $w_2$, being the sons of $x_1$, are not processed);

(step 2) $\inf\{y, x_2\}$ exists (so the subtree with root $x_2$ is processed);

(step 3) $\inf\{y, w_3\}$ exists (the fact is included in the answer);

(step 4) $\inf\{y, w_4\}$ exists (the fact is included in the answer as well). ■

As seen from Example 12, both the direct search and the search by SDB index navigation are executed in four steps, but as the database volume grows, the computational complexity of the navigation tends to logarithmic (of course, if $\tau(W)$ "corresponds" to the structure of the queries [28]).

The generalization of SF-trees to the DBI case is trivial: N-facts (being sentential forms of the CF grammar whose scheme is the metadatabase) are permitted in the terminal nodes of the tree. It is essential that, if we follow M-semantics (34), no corrections to the text of SRCHK are needed. If we use (35), then the assignment operator "then α := {x_Σ}" is replaced by "then α := {inf{y, x_Σ}}".

Consideration of the correction of SF-trees upon inclusion and deletion of facts (N-facts), as well as of the variety of modifications of these trees (ASF-trees, multidimensional SF- and ASF-trees etc.) developed to compress the volume of the stored data and to minimize the computational complexity of the search, is beyond the scope of the present work; the interested reader may find the corresponding content in [27], [28].

VII. AUGMENTED POST SYSTEMS AND APS
INCOMPLETE INFORMATION MODELING

The described elements of DBI modeling illustrate new opportunities that are not available in the classical relational and post-relational approaches. Many more opportunities are provided by the basic SSF knowledge representation model, the augmented Post systems mentioned above, various modifications of which provide OLAP of incomplete information as well as Data Mining [25], [27], [28].

The most efficient knowledge-based systems (KBS) implemented within the SSF operate on SDB with fixed CF grammars: for example, the SDB mentioned above, whose elements (facts) have the form of variable-length pages accessible by their first strings, which allows common hypertext information representation and provides fast information processing because there is no need to parse the incoming flow with an arbitrary CF grammar. Up-to-date APS KBS are able to operate efficiently in many practical applications, one of the most useful being the DPI/DPP mentioned in the introduction.

Let us consider the main elements of APS knowledge representation.

Augmented Post systems were constructed from two predecessors, Post systems and formal grammars, in order to obtain a formalism which, first of all, would be deeply integrated with the set-of-strings database representation and, secondly, would possess deductive features comparable with Prolog and various deductive extensions of the relational model, providing selection not only of explicit data but also of data which may be obtained from the latter by logical inference.

The augmented Post system, or APS-represented knowledge base (for short, APS KB), $P$ is a couple $\langle S, D \rangle$, where $D$ is a metadatabase in the sense of (18)-(19), while $S$ is a set of so-called augmented, or string (S-), productions. An S-production $\sigma = \langle q, d \rangle$ is a couple, the first component of which, named the body, has the form

$s_0 \leftarrow s_1, \dots, s_m$, (45)

where $m \geq 0$ and $s_0, s_1, \dots, s_m$ are Post terms (for short, "terms" below), i.e. non-empty strings of symbols of the terminal alphabet $V$ of the MDB $D$ and variables. The universum of variables is denoted below as $\Gamma$, and for $i = 0, 1, \dots, m$,

$s_i \in (V \cup \Gamma)^+$. (46)

The term $s_0$ is called the header, while the terms $s_1, \dots, s_m$ are called conditions. The set of conditions is unordered, i.e. the S-productions

$\langle s_0 \leftarrow s_{i_1}, \dots, s_{i_m}, d \rangle$ (47)

and

$\langle s_0 \leftarrow s_{j_1}, \dots, s_{j_m}, d \rangle$, (48)

where $\{i_1, \dots, i_m\} = \{j_1, \dots, j_m\} = \{1, \dots, m\}$, are identical.

The mentioned set of conditions may be empty in the general case, i.e. $m = 0$. An S-production with an empty set of conditions, looking like

$\langle s_0 \leftarrow, d \rangle$, (49)

is called an S-axiom.

The component $d$ of the S-production $\sigma = \langle q, d \rangle$ is called the variables declaration and is a set of generation rules of the form

$\gamma \to \beta$, (50)

where $\gamma \in \Gamma$ is a variable present in at least one of the terms $s_0, s_1, \dots, s_m$, and $\beta \in (V \cup N)^*$, where $N$ is the non-terminal alphabet of the metadatabase $D$. Rule (50) is called the declaration of the variable $\gamma$; it defines the set $V(\gamma)$ of permitted values of this variable:

$V(\gamma) = \{\, u \mid \beta \overset{*}{\underset{G}{\Rightarrow}} u \;\&\; u \in V^* \,\}$, (51)

where $G$ is the CF grammar corresponding to the metadatabase $D$ in the sense of (18). It is essential that there is one and only one declaration $\gamma \to \beta$ for every variable $\gamma$ occurring in the body $q$ of the S-production $\sigma = \langle q, d \rangle$. An S-production $\sigma$ whose body $s_0 \leftarrow s_1, \dots, s_m$ has no variables, i.e.

$s_i \in V^+$ (52)

for all $i = 0, 1, \dots, m$, is called a concrete S-production (for short, CS-production). Evidently, in this case $d = \{\emptyset\}$. A CS-production

$\langle s_0 \leftarrow, d \rangle$ (53)

such that $s_0 \in V^+$, $d = \{\emptyset\}$, is called a CS-axiom.

An S-production $\sigma = \langle q, d \rangle$ defines a set of CS-productions $\bar{\sigma}$ in the following way:

$\bar{\sigma} = \bigcup_{u_1 \in V(\gamma_1)} \dots \bigcup_{u_k \in V(\gamma_k)} \{ \langle s_0[\delta] \leftarrow s_1[\delta], \dots, s_m[\delta], \{\emptyset\} \rangle \}$, (54)

$\delta = \{\gamma_1 \to u_1, \dots, \gamma_k \to u_k\}$, (55)

where $s_i[\delta]$ are words in the alphabet $V$ obtained by substituting the strings $u_1, \dots, u_k$ for the variables $\gamma_1, \dots, \gamma_k$ occurring in the terms $s_0, \dots, s_m$ in such a way that all occurrences of the variable $\gamma_j$ in all terms of the body $q$ are replaced by one and the same string $u_j$. An S-production $\sigma \in S$ is called correct with respect to the MDB $D$ if

$(\forall \langle w_0 \leftarrow w_1, \dots, w_m, \{\emptyset\} \rangle \in \bar{\sigma}) \; \{w_0, w_1, \dots, w_m\} \subseteq L(G)$, (56)

i.e. all terms of every CS-production from the set $\bar{\sigma}$ are words of the language $L(G)$, where the CF grammar $G$ corresponds to the metadatabase $D$ in the sense of (18). The APS KB $P = \langle S, D \rangle$ is called correct if all S-productions $\sigma \in S$ are correct with respect to the MDB $D$.

Let us define the notion of the APS KB extensional.

Consider a correct APS KB $P = \langle S, D \rangle$. Let

$\bar{S} = \bigcup_{\sigma \in S} \bar{\sigma}$, (57)

i.e. $\bar{S}$ is the set of all CS-productions defined by all S-productions of this knowledge base in the sense of (54)-(55). Define

$W_{(0)} = \{\, w \mid \langle w \leftarrow, \{\emptyset\} \rangle \in \bar{S} \,\}$, (58)

$W_{(i+1)} = W_{(i)} \cup \Big( \bigcup_{\langle w_0 \leftarrow w_1, \dots, w_m, \{\emptyset\} \rangle \in \bar{S},\; \{w_1, \dots, w_m\} \subseteq W_{(i)}} \{w_0\} \Big)$, (59)

and the extensional $Ex(P)$ of the APS KB $P = \langle S, D \rangle$, i.e. the set of facts defined by this knowledge base, is the fixed point of the sequence $W_{(0)}, \dots, W_{(i)}, W_{(i+1)}, \dots$, which is infinite in the general case:

$Ex(P) = W_{(\infty)}$. (60)

Evidently, due to the correctness of $P$,

$Ex(P) \subseteq L(G)$, (61)

and $Ex(P)$ is finite if there exists $i$ such that

$W_{(i)} = W_{(i+1)}$. (62)

From the linguistic point of view, $Ex(P)$ is a language in the alphabet $V$, defined by the APS $P$ and being a sublanguage of $L(G)$. From the SSF knowledge/data engineering point of view, $Ex(P)$ is a set of facts which are either known explicitly, i.e. $w \in W_{(0)}$ (they are called ground facts), or are derived from ground facts and/or other derived facts. It is evident that the set of ground facts $W_{(0)}$ is a set-of-strings database, while the set of S-productions with non-empty sets of conditions is the APS KB intensional providing the mentioned derivation.

Now we can define the semantics of queries to an APS KB. The set-theoretical semantics of a query to an APS KB is similar to (9):

$A = W \cap I$, (63)

where $W$ is the APS KB extensional, and $I$ is the set of facts whose actuality check is the purpose of the query (for simplicity, the lower indices used in (9) are omitted here).

The simplest query language, harmonized with the similar SDB query language considered above, may be a set of couples $\langle s, d \rangle$, where $s$ is a term and $d$ is its variables declaration, and

$I = Ex(\langle \{\langle s \leftarrow, d \rangle\}, D \rangle)$. (64)

Here $\langle \{\langle s \leftarrow, d \rangle\}, D \rangle$ is a correct APS KB whose set of S-productions (the selection criterion) contains one S-axiom, defining the set of facts which may belong to $Ex(P)$.

Example 13. Consider the metadatabase from Example 2, adding to it the following three rules:

<fact> → SENSOR <i> AT <time> - <state>,
<i> → <symbol><symbol>,
<fact> → SENSOR <i> LOCATED AT AREA <name of area>.

Facts like SENSOR XX AT 12.30 - NORMAL contain information about sensors which gather and send to the fusion centre data about the state of the atmosphere in the surrounding area. Facts like SENSOR XX LOCATED AT AREA YY contain information about the location of sensors.

Consider the APS knowledge base $P = \langle S, D \rangle$, whose MDB $D$ was described above and whose set $S$ contains the following S-productions:

σ_1: < AREA a s AT t ← SENSOR e AT t - s, SENSOR e LOCATED AT AREA a, {a → <name of area>, s → <state>, e → <i>, t → <time>} >,
σ_2: < SENSOR AL AT 15.00 - NORMAL ←, {∅} >,
σ_3: < SENSOR XF AT 12.30 - SMOKED ←, {∅} >,
σ_4: < SENSOR AL LOCATED AT AREA HIGHLANDS ←, {∅} >,
σ_5: < SENSOR XF LOCATED AT AREA GREEN FOREST ←, {∅} >.

As seen, σ_2 - σ_5 are concrete S-productions, and the set $W_{(0)}$ corresponding to this APS KB consists of the ground facts being the headers of these CS-productions.

A query whose purpose is to get information about smoked areas may be as follows:

< AREA a SMOKED AT t, {a → <name of area>, t → <time>} >,

while a query whose purpose is to get information about the location of sensor XF may look like

< SENSOR XF LOCATED AT AREA v, {v → <name of area>} >.

Evidently, the extensional of this knowledge base is

{AREA HIGHLANDS NORMAL AT 15.00,
AREA GREEN FOREST SMOKED AT 12.30,
SENSOR AL AT 15.00 - NORMAL,
SENSOR XF AT 12.30 - SMOKED,
SENSOR AL LOCATED AT AREA HIGHLANDS,
SENSOR XF LOCATED AT AREA GREEN FOREST}. ■
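The fixed-point construction (58)-(62) can be illustrated on the knowledge base of this example by a naive forward-chaining sketch in Python. Regular expressions stand in for the terms and variable declarations of the single non-concrete S-production; this encoding, as well as the names `step` and `extensional`, is an assumption made for illustration and does not check the grammar of the metadatabase.

```python
import re

# Ground facts W_(0): headers of the concrete S-productions of the example.
GROUND = {
    "SENSOR AL AT 15.00 - NORMAL",
    "SENSOR XF AT 12.30 - SMOKED",
    "SENSOR AL LOCATED AT AREA HIGHLANDS",
    "SENSOR XF LOCATED AT AREA GREEN FOREST",
}

# Conditions of the rule "AREA a s AT t <- SENSOR e AT t - s,
# SENSOR e LOCATED AT AREA a", written as regular expressions.
READING = re.compile(r"SENSOR (\S+) AT (\S+) - (\S+)")
LOCATION = re.compile(r"SENSOR (\S+) LOCATED AT AREA (.+)")


def step(facts):
    """One application of (59): add every header whose conditions are known."""
    derived = set()
    for reading in facts:
        r = READING.fullmatch(reading)
        if not r:
            continue
        sensor, time, state = r.groups()
        for location in facts:
            loc = LOCATION.fullmatch(location)
            if loc and loc.group(1) == sensor:   # same value of the variable e
                derived.add(f"AREA {loc.group(2)} {state} AT {time}")
    return facts | derived


def extensional(ground):
    """Iterate W_(i) until W_(i) = W_(i+1), cf. (62)."""
    current = set(ground)
    while True:
        following = step(current)
        if following == current:
            return current
        current = following


for fact in sorted(extensional(GROUND)):
    print(fact)
```

For this knowledge base the iteration stabilizes after one productive step, since the derived AREA facts match neither condition of the rule.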

Expressions (57)-(64) form the mathematical semantics of the considered simple APS KB query language.

The kernel of the operational semantics of this language is the so-called S-unification [25], [28], whose place in the SSF is similar to that of the well-known unification by J. Robinson used in first-order predicate logic resolution procedures [40]. We shall consider the basic concept of S-unification and then briefly describe the axiomatic system providing controlled logical inference of answers to APS KB queries.
Let $\langle s, d \rangle$ be the query and let $\langle s_i, d_i \rangle$ be the couple formed from the S-production

$\sigma_i = \langle s_i \leftarrow s_i^1, \dots, s_i^{m_i}, d_i \rangle$. (65)

As may easily be seen, the answer to $\langle s, d \rangle$ may contain headers of the CS-productions being concretizations of $\sigma_i$, i.e. belonging to the set $\bar{\sigma}_i$, if

$Ex(\langle \{\langle s \leftarrow, d \rangle\}, D \rangle) \cap Ex(\langle \{\langle s_i \leftarrow, d_i \rangle\}, D \rangle) \neq \{\emptyset\}$. (66)

Let us suppose for simplicity that the term $s$ has no more than one occurrence of every variable; let us suppose the same for the term $s_i$. Below, the $\inf$ notation relates to the CF grammar $G$ corresponding to the metadatabase $D$ of the APS KB $P = \langle S, D \rangle$.

Lemma 2. The intersection in (66) is non-empty if and only if the SF

$x = \inf\{s[d], s_i[d_i]\}$ exists.

Proof. If there exists a fact $w$ belonging to the intersection in (66), then $x$ exists in turn. If $x$ exists, then the intersection in (66) is non-empty. Both statements follow from the correctness of the APS KB. ■

It is evident that the SF $x$ corresponds to a new, more precise (informative, concrete) variables declaration $d'_i$ of the S-production $\sigma_i$, which is constructed as follows: if

$s_i = u_1 \gamma_1 u_2 \dots u_k \gamma_k u_{k+1}$, $x = u_1 \beta'_1 u_2 \dots u_k \beta'_k u_{k+1}$, (67)

then

$d'_i = (d_i - \{\gamma_1 \to \beta_1, \dots, \gamma_k \to \beta_k\}) \cup \{\gamma_1 \to \beta'_1, \dots, \gamma_k \to \beta'_k\}$, (68)

where $s_i \overset{*}{\Rightarrow} x$ in the grammar

$G_i = \langle V, N_i, \alpha_0, D_i \rangle$, (69)

$N_i = N \cup \{\gamma_1, \dots, \gamma_k\}$, (70)

$D_i = D \cup \{\gamma_1 \to \beta_1, \dots, \gamma_k \to \beta_k\}$. (71)

As seen, $G_i$ is the CF grammar obtained from $\langle V, N, \alpha_0, D \rangle$ by joining the variables $\gamma_1, \dots, \gamma_k$ occurring in the term $s_i$ to the non-terminal set $N$, and by joining the generating rules $\gamma_1 \to \beta_1, \dots, \gamma_k \to \beta_k$ to the scheme $D$ of the grammar $G$. Because every variable occurs in the term $s_i$ only once, the variables are non-terminals of $G_i$ in the strict sense of this notion. As a result, $d'_i$ is obtained by replacing the declarations of the variables of $s_i$ by new ones, which are more concrete (precise, informative) than the initial declarations in $d_i$, and, what is very important, this concretization corresponds to the query $\langle s, d \rangle$ in such a way that the set of CS-productions whose headers may belong to the answer to this query may be constructed at the following steps of inference.

The sense of the described operation is illustrated by Fig. 3 and Example 13.

Fig. 3. Graphical illustration of S-unification.

Example 13. Consider the query $\langle s, d \rangle$, where

s = AREA a NORMAL AT t,
d = {a → <name of area>, t → <time>},

and the S-production σ_1 from Example 13, where

s_1 = AREA a s AT t,
d_1 = {a → <name of area>, s → <state>, e → <i>, t → <time>}.

According to (67)-(71),

s_1[d_1] = AREA <name of area> <state> AT <time>,
s[d] = AREA <name of area> NORMAL AT <time>,
d'_1 = d_1 − {s → <state>} ∪ {s → NORMAL} = {a → <name of area>, s → NORMAL, e → <i>, t → <time>}. ■
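The refinement of declarations performed by S-unification in this example can be sketched in Python. The sketch assumes that terms are token lists of equal length, that every variable occupies exactly one token, and that a terminal on the query side is always derivable from the non-terminal it meets on the header side (no grammar check is made); the function names are hypothetical.

```python
def is_nonterminal(token):
    return token.startswith("<") and token.endswith(">")


def instantiate(term, decl):
    """s[d]: replace every declared variable of the term by its declaration."""
    return [decl.get(token, token) for token in term]


def s_unify(query, qdecl, header, hdecl):
    """Return refined header declarations, or None if the forms contradict."""
    q, h = instantiate(query, qdecl), instantiate(header, hdecl)
    if len(q) != len(h):
        return None
    refined = dict(hdecl)
    for var, qt, ht in zip(header, q, h):
        if qt == ht:
            continue
        if is_nonterminal(ht) and not is_nonterminal(qt):
            refined[var] = qt        # the query is more concrete here
        elif is_nonterminal(qt) and not is_nonterminal(ht):
            continue                 # the header is already more concrete
        else:
            return None              # two different terminals or non-terminals
    return refined


query = ["AREA", "a", "NORMAL", "AT", "t"]
qdecl = {"a": "<name of area>", "t": "<time>"}
header = ["AREA", "a", "s", "AT", "t"]
hdecl = {"a": "<name of area>", "s": "<state>", "e": "<i>", "t": "<time>"}

print(s_unify(query, qdecl, header, hdecl))
```

The declaration of the variable `e`, which does not occur in the header, is carried over unchanged, in agreement with the result of the example.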

S-unification provides the opportunity to derive answers by controlled logical inference, which is implemented by an axiomatic system containing only three inference rules: top-down successful S-resolution, top-down unsuccessful S-resolution and bottom-up S-resolution [28]. During the inference process, waves of new queries with more and more precise variables declarations are generated until they reach S-axioms; after that CS-productions are formed, and back waves start until another set of facts is assembled. These waves "meet and interfere", so the inference process is bidirectional. The main difference between the well-known first-order predicate logic resolution procedures and the APS axiomatics is that the latter provides not an existential inference mode (i.e. any one fact of the answer set) but a universal one (all facts needed), in full accordance with the S- and M-semantics defined by (63)-(71). All the details, from the case of multiple occurrences of variables to the procedural connection to the APS KB, which provides calls of various software modules and complexes (up to DBMS and drivers of various external online data sources) during answer derivation, may be found by the interested reader in [25], [28], as well as various flow-oriented APS dialects constructed for the efficient handling of specific classes of problems. The latter are the implementation and application of hard real-time network-centric systems, the design of multiagent systems, the strict formal definition of the syntax and semantics of programming languages, natural language processing, computer-aided design, simulation of technical systems etc.

The basic tool of the procedural connection to an APS KB is the so-called program (P-) production of the form $\langle s \Leftarrow p, d \rangle$, where $s$ and $d$ have the same sense as above, being the declarative component of the connection, while $p$ is the name of the connected program, and the symbol "⇐" is a divider distinct from "←". Each program $p$ has its own extensional $X \subseteq L(G)$, which is a subset of $Ex(\langle \{\langle s \leftarrow, d \rangle\}, D \rangle)$.

Example 14. A program named MULT for the multiplication of integer numbers may be connected to the knowledge base by the P-production

< a * b = c ⇐ MULT, {a → <integer>, b → <integer>, c → <integer>} >.

The MULT extensional is an infinite set containing strings like 1 * 1 = 1, 0 * 5 = 0, 311 * -1 = -311 etc. ■

Programs may be time-dependent (with dynamic extensionals) or time-invariant (with static extensionals, as in the example above). The former are usually various DBMS and hardware drivers (typically of sensors and actors in network-centric environments), while the latter are usually various operations on data (numerical, graphical, textual etc.).

There is a simple tool for logical inference control within the APS knowledge representation. Namely, variables whose declarations after S-unification must have right parts consisting only of terminal symbols (i.e. there is no incomplete information, associated with non-terminals, inside these declarations) are marked by a point over the arrow in the initial descriptions occurring in P- and S-productions. This concerns the variables a and b in the example above, whose declarations would be a $\dot{\to}$ <integer>, b $\dot{\to}$ <integer>, so processing of the query

< c * e = f, {c → 1, e → 4, f → <integer>} >

during inference will result in a call of the program MULT, while processing of the query

< x * a = s, {x → <integer>, a → 135, s → <integer>} >

will not, because of the information incompleteness of the declaration of the variable x.
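The inference control described above may be illustrated by a small Python sketch: the connected program is called only when the declarations of the "dotted" variables a and b consist of terminal symbols. The function `mult_production`, the regular expression for integers and the dictionary encoding of declarations are assumptions of the sketch, not the notation of APS.

```python
import re

# A declaration is "terminal" here if it is the decimal notation of an integer.
INTEGER = re.compile(r"-?\d+")


def mult_production(decl):
    """Call MULT only if the declarations of a and b are completely known."""
    a, b = decl["a"], decl["b"]
    if not (INTEGER.fullmatch(a) and INTEGER.fullmatch(b)):
        return None                       # incomplete information: no call
    return f"{a} * {b} = {int(a) * int(b)}"


print(mult_production({"a": "311", "b": "-1", "c": "<integer>"}))
print(mult_production({"a": "<integer>", "b": "135", "c": "<integer>"}))
```

The first call yields a string of the MULT extensional, while the second returns `None`, since one factor is declared only as `<integer>`.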

Coming back to the main subject of this article, let us consider the APS KB incomplete information modeling techniques. They are quite simple and fully correspond to those described above in application to SDB, as well as to those already developed for relational databases with Datalog-like extensions [17].

An S-production

$\sigma = \langle s_0 \leftarrow s_1, \dots, s_m, d \rangle$ (72)

is called complete if the set of variables of its header $s_0$, denoted $\Gamma(s_0)$, is a subset of the set of variables occurring in the conditions $s_1, \dots, s_m$:

$\Gamma(s_0) \subseteq \Gamma(s_1 \dots s_m)$, (73)

where the term $s_1 \dots s_m$ is constructed by concatenation of the terms $s_1, \dots, s_m$.

On the contrary, if

$\Gamma(s_0) - \Gamma(s_1 \dots s_m) \neq \{\emptyset\}$, (74)

S-production (72) is incomplete.

Incomplete S-axioms $\langle s_0 \leftarrow, d \rangle$ such that $s_0[d] \in SF(G) - V^*$, i.e. the header $s_0$ contains at least one variable $\gamma$ such that $\gamma \to \beta \in d$ and $\beta \in SF(G) - V^*$, naturally correspond to the N-facts considered above. This allows a transfer from set-of-strings databases with incomplete information to similar APS knowledge bases, and a generalization of the SDBI handling techniques to APS KB with incomplete S-productions. On the other hand, the SDB search techniques briefly described above are directly and efficiently applicable to implementations of APS KB inference engines, minimizing the redundant search of S- and P-production headers and thus the computational complexity of inference.

VIII. CONCLUSION 

As may be seen, the described approach provides convergence of very close but still stand-alone areas: the theory of formal languages, information theory and knowledge/data engineering. The synergy obtained by this convergence may be very useful for the creation of the unified theory so necessary for solving practical problems.

The most interesting directions of future development of the described approach are, in our opinion, the following:

1) information concretization in SDBI and APS KBS with the help of functional dependencies known from the theory of relational databases [11], [12], [20];

2) management of contradictory SDBI and APS KBS;

3) SDBI and APS KBS with metadatabases describing two- and three-dimensional objects;

4) transfer of the SSF ideology and techniques to numeric data with the help of the SSF-associated theory of recursive multisets and optimizing multiset grammars/metagrammars [26];

5) hardware implementation of the SSF.

The author is ready to cooperate with all scholars and engineers who are interested in the listed directions as well as in the SSF in general.